Importance sampling for unbiased on-demand evaluation of knowledge base population
Abstract
Knowledge base population (KBP) systems take in a large document corpus and extract entities and their relations. Thus far, KBP evaluation has relied on judgements on the pooled predictions of existing systems. We show that this evaluation is problematic: when a new system predicts a previously unseen relation, it is penalized even if it is correct. This leads to significant bias against new systems, which counterproductively discourages innovation in the field. Our first contribution is a new importance-sampling-based evaluation which corrects for this bias by annotating a new system's predictions on demand via crowdsourcing. We show this eliminates bias and reduces variance using data from the 2015 TAC KBP task. Our second contribution is an implementation of our method made publicly available as an online KBP evaluation service. We pilot the service by testing diverse state-of-the-art systems on the TAC KBP 2016 corpus and obtain accurate scores in a cost-effective manner.
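The general idea of importance-weighted evaluation can be illustrated with a minimal sketch. It is not the estimator from the paper, only a generic self-normalized importance sampling estimate of precision. It assumes that a subset of a system's predictions was sampled for annotation from some proposal distribution, and that each annotated prediction has a binary correctness label. The function name `importance_weighted_precision` and its parameters are hypothetical.

```python
from typing import Sequence


def importance_weighted_precision(
    labels: Sequence[int],
    target_probs: Sequence[float],
    proposal_probs: Sequence[float],
) -> float:
    """Self-normalized importance sampling estimate of precision.

    Assumptions (illustrative, not taken from the paper):
    labels[i] is 1 if annotators judged sampled prediction i correct, else 0;
    target_probs[i] is its probability under the distribution we want to
    evaluate on (for example, uniform over the system's predictions);
    proposal_probs[i] is the probability with which it was actually sampled.
    """
    if not (len(labels) == len(target_probs) == len(proposal_probs)):
        raise ValueError("all inputs must have the same length")
    if len(labels) == 0:
        raise ValueError("at least one annotated sample is required")
    if any(q <= 0 for q in proposal_probs):
        raise ValueError("proposal probabilities must be positive")

    # Reweight each sample by how under- or over-represented it is
    # in the proposal relative to the target distribution.
    weights = [p / q for p, q in zip(target_probs, proposal_probs)]
    total = sum(weights)
    return sum(w * y for w, y in zip(weights, labels)) / total


# Example: the second and third predictions were under-sampled relative to
# the first, so they receive larger weights.
estimate = importance_weighted_precision(
    labels=[1, 0, 1],
    target_probs=[0.1, 0.1, 0.1],
    proposal_probs=[0.2, 0.1, 0.05],
)
```

When the proposal matches the target distribution, all weights are equal and the estimate reduces to the plain fraction of sampled predictions judged correct.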
Similar resources
Application of adaptive sampling in fishery part 2: Truncated adaptive cluster sampling designs
Researchers quite often encounter very large networks in the initial sample, so large that they cannot finish the sampling procedure. Marine biologists have proposed and used two solutions, which we discuss in this article: i) Adaptive cluster sampling based on order statistics with a stopping rule, ii) Restricted adaptive cluster sampling. Until...
Unbiased Concurrent Evaluation on a Budget
Eliciting expert judgments for evaluating the performance of structured prediction systems (e.g., search engines, recommender systems) is labor-intensive and costly. This raises the question of how to get high-quality performance estimates given a relatively small budget of judgments. In this paper, we provide theoretically justified – yet highly practical and efficient – methods for selecting ...
Knowledge Based System for the Evaluation of Safety and the Prevention of Railway Accidents
This paper describes a contribution to improving the usual safety analysis methods used in the certification of railway transport systems. The methodology is based on the complementary and simultaneous use of knowledge acquisition and machine learning. The purpose is to contribute to the generation of new accident scenarios that could help experts to conclude on the safe character of a new rail t...
Row and Column Elimination Sampling Design +1 and its Efficiencies
Extended Abstract. It is a traditional way in biological, sociological, agricultural and geological studies to partition a geographical area into quadrats and then take a sample of them by a particular sampling design. We study the relevant characteristic of quadrats to estimate a parameter of the population. We suppose that the variable of interest has a positive spatial autocorrelation. Sampl...
An Empirical Analysis of Likelihood-Weighting Simulation on a Large, Multiply-Connected Belief Network
We analyzed the convergence properties of likelihood weighting algorithms on a two-level, multiply connected, belief-network representation of the QMR knowledge base of internal medicine. Specifically, on two difficult diagnostic cases, we examined the effects of Markov blanket scoring, importance sampling, and self-importance sampling, demonstrating that the Markov blanket scoring and self-im...